Python TAP test conversion using pytest #2

Closed
adunstan wants to merge 87 commits into master from pytap

adunstan commented Jun 6, 2026

Conversion done by Claude Code

adunstan added 30 commits June 6, 2026 08:12
A ctypes binding of libpq (bindings, constants, OIDs, library discovery,
notification and result handling) plus a Session class providing synchronous,
asynchronous, pipeline, COPY-free NOTIFY/notice and non-blocking query
execution.  This lets the Python test suite run SQL in-process without forking
psql.

PostgresServer manages a cluster's lifecycle (initdb, start/stop/restart,
promote), configuration, in-process SQL, log inspection, backup/streaming/
archiving/restore, WAL helpers, replication-slot helpers and connect_ok/
connect_fails connection assertions.  PgBin runs client programs; the fixtures
(pg_bin, create_pg, pg, conn, bindir, libdir) build the common test objects and
tear them down automatically.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>

pgtap is a pytest plugin that emits TAP for the meson/prove harness when
TESTLOGDIR is set, and maps a whole-module skip to success.  The repository
pyproject.toml carries the pytest configuration.  meson gains a pytest feature
option and a kind=='pytest' test branch, so each directory can list pytest
suites beside its tap suites.  Includes the suite's own self-tests.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>

Helpers used by the heavier test suites: a pg_regress runner, an OpenSSL-backed
SSL server configurator, a slapd launcher, a stand-alone Kerberos KDC, and a
launcher for the mock OAuth provider.

initdb, pg_ctl, pg_controldata, pg_resetwal, pg_config, pg_test_fsync,
pg_test_timing, pg_archivecleanup, pg_waldump, pg_walsummary and scripts.

pg_basebackup, pg_rewind, pg_verifybackup, pg_combinebackup, pg_checksums,
pg_amcheck and amcheck.

bin/pg_dump, bin/pg_upgrade and the test_pg_dump module.  pg_upgrade delegates
dump adjustment to a small Perl CLI wrapper (pyt/adjust_dump.pl).

auto_explain, basebackup_to_shell, bloom, dblink, oid2name, pg_prewarm,
pg_stash_advice, pg_stat_statements, pg_visibility, test_decoding, vacuumlo,
postgres_fdw and sepgsql.

Includes the test_checksums DataChecksums helper (pyt/conftest.py) and the
injection-point-gated suites (test_aio, test_misc, xid_wraparound, ...).

ssl, ldap, ldap_password_func, kerberos, authentication, oauth_validator,
libpq-oauth and ssl_passphrase_callback.  These use the SSL/LDAP/Kerberos/OAuth
helpers and skip cleanly where a daemon or build feature is unavailable.

interfaces/libpq, interfaces/ecpg/preproc, bin/psql (tab-completion and pager
driven via pexpect), bin/pgbench, test/icu and tools/pg_bsd_indent.

Document the Python port of the Perl TAP suite: layout, how to run the
tests under meson and directly with pytest, the shared fixtures, and the
PostgresServer/Session/PgBin framework classes.

The in-process libpq layer loads libpq into the test interpreter via
ctypes, which is sensitive to two things the Perl TAP suite never hits
because it execs psql as a separate, matching binary:

* linux-meson-32 cross-builds a 32-bit (i386) libpq, but the CI python is
  64-bit; ctypes.CDLL() then fails every test with "wrong ELF class:
  ELFCLASS32".  Add an autouse session fixture that reads libpq's ELF
  header (without dlopen()ing it, which would abort under ASan) and skips
  the suite with a clear reason when the interpreter's ABI does not match.

* linux-meson-64 builds an AddressSanitizer-instrumented libpq.  Loading
  it into an otherwise uninstrumented python aborts with "ASan runtime
  does not come first in initial library list" (reported as exit 250 via
  testwrap).  Preload the ASan runtime for that job's test step so ASan
  initializes first; scoped to the step so the build is unaffected, and
  detect_leaks is already disabled via ASAN_OPTIONS.

Both verified against a local -fsanitize=address build.
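
The ABI check in the first bullet can be sketched without ever loading the library. This is a minimal illustration, not the suite's fixture: the helper names `elf_bits_from_header` and `libpq_abi_mismatch` are hypothetical, and the sketch relies only on the standard ELF identification layout (the magic bytes `\x7fELF`, then EI_CLASS at byte offset 4, where 1 means 32-bit and 2 means 64-bit).

```python
import struct
from typing import Optional

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF identification block (byte offset 4).
ELFCLASS_BITS = {1: 32, 2: 64}


def elf_bits_from_header(header: bytes) -> Optional[int]:
    """Return 32 or 64 for an ELF header, or None if it is not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELFCLASS_BITS.get(header[4])


def libpq_abi_mismatch(libpq_path: str) -> Optional[str]:
    """Return a skip reason if libpq's word size differs from Python's.

    Only the first bytes of the file are read; the library is never
    dlopen()ed, so an ASan-instrumented libpq cannot abort the process.
    """
    with open(libpq_path, "rb") as f:
        lib_bits = elf_bits_from_header(f.read(16))
    if lib_bits is None:
        return None  # not ELF (e.g. macOS or Windows): nothing to compare
    py_bits = struct.calcsize("P") * 8  # pointer size of this interpreter
    if lib_bits != py_bits:
        return (f"{libpq_path} is a {lib_bits}-bit library but this "
                f"Python is {py_bits}-bit")
    return None
```

A session-scoped autouse fixture could then call `pytest.skip()` with the returned reason; locating the libpq file is left out of this sketch.

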
The Python test suite is enabled in meson's 'auto' mode only when pytest is
found, so the macOS and Windows jobs were silently skipping it.  Install it
there:

* macOS (MacPorts): add py312-pytest, plus py312-pexpect for the interactive
  psql tests (macOS has ptys).
* Windows MSVC (pip) and MinGW (pacman): add pytest.  pexpect is omitted --
  it needs a pty, which Windows lacks, and the tests that want it
  call pytest.importorskip() on it.

The framework was Unix-domain-socket-only, so the suite could not run on
Windows (which is why enabling pytest there would have failed every test).
Mirror PostgreSQL::Test::Cluster instead: use Unix sockets everywhere except
Windows, where the server listens on 127.0.0.1; PG_TEST_USE_UNIX_SOCKETS
forces Unix sockets even on Windows.

PostgresServer now derives its connection host from that choice (the socket
directory, or the loopback address) and writes the matching
listen_addresses / unix_socket_directories via a shared helper used by both
init() and init_from_backup().  Unix behavior is unchanged.

Verified: unit-checked both transports; drove a server end-to-end over TCP
on Linux (connects, inet_server_addr() = 127.0.0.1); the existing Unix-mode
suite still passes.
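
The transport choice described above reduces to a small rule plus two configuration variants. A minimal sketch with hypothetical function names: it assumes any non-empty PG_TEST_USE_UNIX_SOCKETS value forces Unix sockets, and the lines the framework actually writes may differ in detail.

```python
import os
import sys
from typing import List, Mapping, Tuple


def use_unix_sockets(platform: str = sys.platform,
                     env: Mapping[str, str] = os.environ) -> bool:
    """Unix sockets everywhere except Windows, unless forced by the env."""
    return platform != "win32" or bool(env.get("PG_TEST_USE_UNIX_SOCKETS"))


def listen_settings(socket_dir: str, unix: bool) -> Tuple[str, List[str]]:
    """Return (libpq host, postgresql.conf lines) for the chosen transport."""
    if unix:
        # Forward slashes in the config value, because the configuration
        # parser treats a backslash as an escape character.
        conf_dir = socket_dir.replace("\\", "/")
        return socket_dir, [f"unix_socket_directories = '{conf_dir}'",
                            "listen_addresses = ''"]
    return "127.0.0.1", ["unix_socket_directories = ''",
                         "listen_addresses = '127.0.0.1'"]
```

Sharing one helper between init() and init_from_backup() keeps the host a test connects to and the address the server listens on derived from the same decision.

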
With the framework now able to listen on TCP, give the tests that still
assume Unix-domain sockets the same platform conditions the Perl suite uses:

* Auth tests that need local-socket auth methods (001 password, 002 saslprep,
  003 peer, 004 file inclusion, 006 login trigger) get a module-level skipif
  unless Unix sockets are in use, mirroring "plan skip_all unless $use_unix_sockets".
* 005 SSPI is the inverse -- it runs only on Windows over TCP -- matching the
  Perl "!$windows_os || $use_unix_sockets" condition.
* The postmaster tests opened a hardcoded AF_UNIX raw socket; replace the
  three copies with PostgresServer.raw_connect() / raw_connect_works()
  (transport-aware, like PostgreSQL::Test::Cluster) and skip when raw_connect
  does not work.
* 027_nosuperuser's password_required sub-test relies on local md5 auth, so
  skip it over TCP, as the Perl test does.

The load-balance and negotiate-encryption tests already gate themselves
(framework binding limitation / PG_TEST_EXTRA), and the createsubscriber test
passes node.host as --socketdir exactly as the Perl test does; only stale
"Unix-socket-only" comments are corrected there and in 036_sequences.

Verified: the auth and postmaster suites pass on Unix (SSPI skips with the
right reason); raw_connect() works over TCP in a simulated-Windows run.

The skip notes claimed the framework is unix-socket-only and always writes
listen_addresses = ''.  That is no longer true now that PostgresServer can
listen on TCP (on Windows).  Reword to reflect that the remaining blocker is
the lack of per-node binding to distinct loopback IPs (own_host), which is why
the test still skips even with TCP available.  Comment-only.

004_load_balance_dns needs three servers bound to distinct loopback addresses
(127.0.0.1/2/3) on the *same* TCP port, fronted by one DNS name -- a topology
PostgreSQL::Test::Cluster builds with own_host => 1 plus an explicit shared
port.  The Python framework had no equivalent, so the test was skipped
unconditionally even where the Perl test runs.

Add it:

* PostgresServer takes a listen_host; when set it binds that loopback address
  over TCP (listen_addresses = '<ip>') regardless of the platform default.
* create_pg gains port= (pin an explicit port) and own_host= (assign
  127.0.0.1, .2, .3, ... from a per-test counter), mirroring own_host and
  $last_host_assigned.
* prepare_environment() now defaults PGDATABASE=postgres, as
  PostgreSQL::Test::Cluster does, so a connection string without an explicit
  dbname (as in this test) does not fall through to the OS user name.

004_load_balance_dns now runs its three nodes with own_host and a shared port;
the unconditional framework-gap skip is removed (it still skips without
PG_TEST_EXTRA=load_balance, the Linux/Windows requirement, or the prepared
hosts file).

Verified: the full test passes on a host with the /etc/hosts entries and
PG_TEST_EXTRA=load_balance; framework/auth/postmaster/load-balance suites
still pass (SSPI skips).
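
The per-test address assignment behind own_host= can be pictured as a simple counter. `LoopbackAllocator` and `listen_addresses_line` are hypothetical names used only for illustration; the framework's actual bookkeeping may differ.

```python
import itertools


class LoopbackAllocator:
    """Hand out 127.0.0.1, 127.0.0.2, ... within one test."""

    def __init__(self) -> None:
        self._next = itertools.count(1)

    def next_host(self) -> str:
        n = next(self._next)
        if n > 254:
            raise RuntimeError("ran out of 127.0.0.x addresses")
        return f"127.0.0.{n}"


def listen_addresses_line(listen_host: str) -> str:
    # With listen_host set, the node binds exactly that loopback address,
    # which is what lets several nodes share one TCP port.
    return f"listen_addresses = '{listen_host}'"
```

Creating a fresh allocator for each test restarts the sequence at 127.0.0.1, so address assignment does not depend on test order.

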
The pg_combinebackup and pg_upgrade tests read their copy mode with
os.environ.get("PG_TEST_PG_COMBINEBACKUP_MODE", "--copy").  Python's get()
only uses the default when the key is absent, so a key that is *set but empty*
yields "".  On the linux-meson-64 CI job the shared environment step writes
PG_TEST_PG_COMBINEBACKUP_MODE= and PG_TEST_PG_UPGRADE_MODE= (empty), so mode
became "" and an empty argument was passed to pg_combinebackup/pg_upgrade --
e.g. pg_combinebackup treated "" as an extra input backup directory and failed
with: could not open version file "/PG_VERSION".

The Perl tests use `$ENV{...} || '--copy'`, which falls back on the empty
string too.  Mirror that with `os.environ.get(...) or "--copy"` at all twelve
mode sites.  Also harden the two PG_TEST_TIMEOUT_DEFAULT readers the same way
(int(os.environ.get(...) or "180")), since a set-but-empty value there would
raise ValueError at import.
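
The difference between the two spellings is easy to see with a plain dict standing in for os.environ. The helper names below are illustrative only.

```python
import os
from typing import Mapping


def combinebackup_mode(env: Mapping[str, str] = os.environ) -> str:
    # "or" falls back on a missing key *and* on a set-but-empty value,
    # like Perl's `$ENV{...} || '--copy'`.
    return env.get("PG_TEST_PG_COMBINEBACKUP_MODE") or "--copy"


def timeout_default(env: Mapping[str, str] = os.environ) -> int:
    # int("") would raise ValueError, so apply the same fallback first.
    return int(env.get("PG_TEST_TIMEOUT_DEFAULT") or "180")


empty = {"PG_TEST_PG_COMBINEBACKUP_MODE": "", "PG_TEST_TIMEOUT_DEFAULT": ""}
# The buggy form: get()'s default is ignored because the key exists.
print(repr(empty.get("PG_TEST_PG_COMBINEBACKUP_MODE", "--copy")))  # ''
print(combinebackup_mode(empty), timeout_default(empty))  # --copy 180
```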

These failures only surfaced now because linux-meson-64 (AddressSanitizer) is
the one job that actually runs the pytest suite, and only after the earlier
LD_PRELOAD fix stopped the whole suite aborting at startup.

Verified: with PG_TEST_PG_COMBINEBACKUP_MODE='' / PG_TEST_PG_UPGRADE_MODE=''
the full pg_combinebackup and pg_upgrade suites pass (previously 9 errored).

Those jobs build with -Dauto_features=disabled, which turns the pytest
feature (default 'auto') off, so meson never enabled the suite and every
Python test was reported as "pytest not enabled" -- hundreds of skips on
macOS and Windows while the same tests run on Linux.

Add -Dpytest=enabled to MESON_COMMON_FEATURES (macOS, Windows MinGW) and to
the Windows MSVC job's MESON_FEATURES.  All three already install pytest, so
'enabled' is safe and additionally fails loudly if that install regresses.

…conns

wait_for_catchup polls pg_stat_replication with a per-row query and
WHERE application_name IN ('<standby>', 'walreceiver'); poll_query_until
then requires the output to equal exactly "t".  When two connections match
-- e.g. a logical subscriber named <standby> alongside a physical standby
reporting application_name = 'walreceiver', as in
subscription/test_038_walsnd_shutdown_timeout -- the query returns "t\nt",
which never equals "t", so the wait times out even though both connections
have caught up.

Aggregate the per-row condition with bool_and so the query always yields a
single row.  The result is unchanged when only one connection matches, and
correctly requires all matching connections to have caught up when several
do.  (The same latent bug exists in PostgreSQL::Test::Cluster, fixed
separately.)

The pytest suite should read on its own terms, not as commentary on the
Perl suite it was ported from.  Remove "mirrors PostgreSQL::Test::*",
"as in Perl", and similar references from comments and docstrings, keeping
the behavioral explanation itself.

…back

Supersedes the earlier bool_and approach (commit 4691246, a squash
candidate).  wait_for_catchup matched application_name IN ('<standby>',
'walreceiver'); the 'walreceiver' alternative is needed for standbys that
connect without setting application_name (e.g. a primary_conninfo generated by
pg_rewind/pg_basebackup --write-recovery-conf, as in pg_rewind
test_007_standby_source).  But the IN clause returns two rows when a named
connection coexists with a separate 'walreceiver' connection -- e.g. the
logical subscriber test_sub alongside a physical standby in
subscription/test_038_walsnd_shutdown_timeout -- giving "t\nt", which never
equals poll_query_until's "t", so the wait spuriously times out.

Match the requested name, and fall back to 'walreceiver' only when no
connection with that name exists.  Also give the pg_rewind conftest standby an
explicit application_name so it is matched by name, and assert in the
replication self-test that a streaming standby reports its node name.
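
One way to express the name-then-fallback match in SQL is sketched below. It illustrates the idea rather than reproducing the framework's query: `catchup_query` is a hypothetical helper, the coalesce() subquery is just one possible formulation, and the caller is assumed to compare the result against "t" as poll_query_until does. It still returns one row per walsender carrying the selected name.

```python
_LSN_COLUMNS = {"sent", "write", "flush", "replay"}


def quote_literal(value: str) -> str:
    """Quote a SQL string literal (standard_conforming_strings assumed on)."""
    return "'" + value.replace("'", "''") + "'"


def catchup_query(standby_name: str, target_lsn: str, mode: str = "replay") -> str:
    """Build a query that is true once the matching walsender has caught up.

    The requested application_name is used if any connection reports it;
    only otherwise does the match fall back to 'walreceiver'.
    """
    if mode not in _LSN_COLUMNS:
        raise ValueError(f"unknown mode {mode!r}")
    name = quote_literal(standby_name)
    return (
        f"SELECT {quote_literal(target_lsn)}::pg_lsn <= {mode}_lsn "
        "AND state = 'streaming' "
        "FROM pg_catalog.pg_stat_replication "
        "WHERE application_name = coalesce("
        "(SELECT application_name FROM pg_catalog.pg_stat_replication "
        f"WHERE application_name = {name} LIMIT 1), 'walreceiver')"
    )
```

With test_sub and a separate physical standby both connected, only test_sub's row is selected, so the result is a single "t" once it has caught up.

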
Under parallel test execution (meson --num-processes), several initdb
processes can race creating a shared temp ancestor directory.  initdb's
pg_mkdir_p is not tolerant of that race -- when its stat() does not see the
directory but a concurrent process creates it before the following mkdir(),
mkdir() fails with EEXIST and pg_mkdir_p treats it as fatal.  On the Windows CI
this failed a large number of pytest tests with:

    initdb: error: could not create directory "...\\pytest-of-runneradmin": File exists

Create the (empty) data directory in the framework first, using Python's
makedirs (which does tolerate the concurrent-create race).  initdb then takes
its "present but empty" path and never calls pg_mkdir_p.  (The underlying
pg_mkdir_p race is a separate, core fix.)

test_010_pg_basebackup created a file with a non-UTF8 name to test backing up
such files.  macOS (like some Windows code pages) rejects the name, and the
unconditional open() raised OSError.  Wrap it so we quietly proceed without
that coverage when the filesystem refuses the name.

test_002_compare_backups placed its tablespace under the per-test tmp_path.
Those tablespace symlinks are written into a base backup's tar stream, whose
target length is limited (~100 bytes); the deep tmp_path layout (very long on
macOS) overflowed it, failing with "symbolic link target too long for tar
format".  Add a tempdir_short fixture -- a directory directly under the system
temp area -- and use it for the tablespace locations.

test_042_low_level_backup copies a running primary's data directory with
pg_backup_start() held open.  That races with the server: a file present when
a directory is scanned (e.g. a pg_wal/archive_status flag) can be gone before
it is copied, and shutil.copytree then raised FileNotFoundError -- failing on
macOS, where the timing made the race reliable.

Add a copy_live_tree() helper that recursively copies but silently skips
entries that disappear mid-copy (and recreates symlinks), as a low-level
backup must, and use it instead of shutil.copytree.
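
A helper along these lines is sketched below. It assumes POSIX symlinks (on Windows the tablespace links are junctions, which this sketch does not special-case), and the suite's copy_live_tree() may differ in detail.

```python
import os
import shutil


def copy_live_tree(src: str, dst: str) -> None:
    """Recursively copy src to dst, skipping entries that vanish mid-copy.

    Symlinks are recreated rather than followed, so pg_tblspc links are
    copied as links.
    """
    try:
        with os.scandir(src) as it:
            entries = list(it)
    except FileNotFoundError:
        return  # the whole directory disappeared after it was listed
    os.makedirs(dst, exist_ok=True)
    for entry in entries:
        target = os.path.join(dst, entry.name)
        try:
            if entry.is_symlink():
                os.symlink(os.readlink(entry.path), target)
            elif entry.is_dir(follow_symlinks=False):
                copy_live_tree(entry.path, target)
            else:
                shutil.copy2(entry.path, target)
        except FileNotFoundError:
            # Removed by the running server between listing and copying,
            # e.g. a pg_wal/archive_status flag file.
            continue
```

Only FileNotFoundError is swallowed; any other error (permissions, a full disk) still fails the copy, so a genuinely broken backup is not silently accepted.

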
pg_mkdir_p creates each missing path component with a stat() followed by
mkdir().  If the stat() reports the component as absent but another process
creates it in the window before this process's mkdir(), mkdir() fails with
EEXIST and pg_mkdir_p treated that as a hard error -- unlike "mkdir -p", which
is meant to be idempotent and race-tolerant.

This shows up when several processes concurrently create paths that share an
ancestor directory: for example, parallel initdb runs whose data directories
live under a common temporary directory.  One process wins the race to create
the shared ancestor and the others fail with

    could not create directory "...": File exists

It is more easily hit on Windows, where stat() of a directory undergoing
concurrent creation can transiently fail, but the race exists everywhere.

After a failing mkdir(), accept the result when errno is EEXIST and the path
now exists as a directory; only then is the failure genuine.
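
The same rule can be expressed in Python for illustration. This is not the C source of pg_mkdir_p, and `mkdir_p` is a hypothetical name: check each component, try to create it, and treat EEXIST as success only if the path is now a directory.

```python
import errno
import os


def mkdir_p(path: str, mode: int = 0o777) -> None:
    """Create path and any missing parents, tolerating concurrent creators."""
    # Collect the path and each of its ancestors, then walk root first.
    components = []
    head = os.path.abspath(path)
    while True:
        components.append(head)
        parent = os.path.dirname(head)
        if parent == head:
            break
        head = parent
    for component in reversed(components):
        if os.path.isdir(component):  # the stat() step
            continue
        try:
            os.mkdir(component, mode)
        except OSError as e:
            # Another process may have created it after our check; that is
            # fine as long as it really is a directory now.
            if e.errno == errno.EEXIST and os.path.isdir(component):
                continue
            raise
```

An existing regular file in the path still produces an error, because EEXIST is accepted only when the component turns out to be a directory.

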
…AP harness

The pytest framework fixtures added in commit 714445b placed their working
directories under pytest's tmp_path, i.e. the shared pytest-of-<user> base.
Under meson each test file runs as its own pytest process, and pytest
concurrently creates and rotates numbered directories beneath that shared base.
On Windows that churn makes stat()/mkdir() on the shared ancestor unreliable,
so directory creation (e.g. by initdb) races and fails (after the pg_mkdir_p
fix, as "Permission denied").

The TAP harness avoids this by using the per-test directory meson's testwrap
provides in TESTDATADIR (PostgreSQL::Test::Utils sets tmp_check to it).  Add a
shared test_datadir fixture that returns TESTDATADIR when set, falling back to
tmp_path for a standalone pytest run, and have create_pg, ldap_server,
kerberos and ssl_server use it.  Each test then gets its own, un-churned
directory and parallel processes no longer contend on a common ancestor.

adunstan added 28 commits June 9, 2026 17:25
…07_catcache_inval

The test started "SELECT foofunc(1)" asynchronously (it pauses on the
catcache-list-miss injection point), then immediately invalidated and woke the
point.  do_async returns as soon as the query is sent, so in-process the wakeup
could run before the backend reached the point, failing with "could not find
injection point ... to wake up".  The TAP test only avoids this by accident:
the latency of spawning psql for the intervening CREATE FUNCTION gives the
backend time to arrive, latency the in-process layer does not have.  Wait
explicitly for the backend to be paused at the injection point first.
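
The fix amounts to polling until the backend is visibly waiting before issuing the wakeup. A generic sketch follows. `poll_until` and `WAIT_QUERY` are assumptions: the query is modelled on the backend_type/wait_event check performed by the Perl wait_for_event helper, with the point name taken from the message above, and the real test uses the framework's own polling helper.

```python
import time
from typing import Callable

# Hypothetical query: true once a client backend is paused at the point.
WAIT_QUERY = (
    "SELECT count(*) > 0 FROM pg_stat_activity "
    "WHERE backend_type = 'client backend' "
    "AND wait_event = 'catcache-list-miss'"
)


def poll_until(predicate: Callable[[], bool], timeout: float = 180.0,
               interval: float = 0.1,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Call predicate until it returns true, or raise TimeoutError."""
    deadline = clock() + timeout
    while not predicate():
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)
```

Given a helper that runs a query and returns its result as text, the test would call something like `poll_until(lambda: run_query(WAIT_QUERY) == "t")` before waking the point (where `run_query` is a placeholder).

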
The framework cached one libpq Session per node (added with the framework in
714445b) and routed every safe_sql/sql call and the `conn` fixture through
it.  That conflated logically separate sessions onto one connection, so they
shared session state -- GUCs, search_path, temp tables, transaction state --
which is semantically wrong, and it was the source of two flake classes: a
connection left stale by a crash/restart was silently reused, and operations
that the v13 reference paced with a fresh connection per safe_psql ran
back-to-back here, exposing timing races.

Sessions are cheap, so open a fresh one per call instead, matching the v13
PostgreSQL::Test::Session model:

  * safe_sql/sql open a short-lived connection, run the statement and close it
    (independent session per call);
  * session() returns a fresh persistent Session the caller owns;
  * the conn fixture opens its own Session and closes it at teardown.

Drop the _sessions cache and _close_sessions plumbing.  Verified across ~175
tests locally (direct session() users, conn users, crash/injection-point
tests); none relied on the previous shared state.

Commit f1585f9 stopped caching a shared per-node libpq session, so
safe_sql/sql now open and close a fresh connection per call.  Fifteen
converted tests relied on the old shared connection and failed once it
was removed.  Fix each to the intended semantics without a shared cache:

- Remove stale references to the deleted PostgresServer._sessions dict
  and _close_sessions() method.  safe_sql now leaves nothing connected to
  a database after each call, so the pre-DROP/CREATE DATABASE session
  eviction is unnecessary (test_006_db_file_copy, test_011_generated,
  test_006_logical_decoding, test_030_stats_cleanup_replica,
  test_031_recovery_conflict, test_032_relfilenode_reuse,
  test_003_start_stop).

- In-place tablespace creation needs allow_in_place_tablespaces set in
  the same session as CREATE TABLESPACE, but CREATE TABLESPACE cannot run
  in a transaction block (and a multi-statement string is one implicit
  transaction).  Run the SET and the CREATE as separate statements on one
  persistent connection (test_002_tablespace, test_010_pg_basebackup,
  test_011_in_place_tablespace, test_012_ddlutils,
  test_033_replay_tsp_drops, test_002_pg_dump).

- test_006_login_trigger counted login-trigger firings and relied on a
  safe_sql not opening a new connection.  Fold the "mallory never logged
  in" check into the preceding connect_ok via a new stdout_unlike option
  so the count stays correct.

- test_009_twophase set synchronous_commit on the shared session before a
  COMMIT PREPARED issued while the synchronous standby was down; losing
  the GUC made COMMIT PREPARED wait forever.  Issue the SET and the
dependent statements on one persistent connection.

Three authentication-related pytest failures on the Windows CI jobs, none of
which is actually an environment-visibility problem (the ucrt-based VS and
UCRT64 MinGW libpq builds both read os.environ as CPython sets it):

- connect_fails waited for a "forked new client backend, pid=N socket=..."
  log record paired with the matching backend-exit record.  Windows
  (EXEC_BACKEND) never logs the fork record, so the wait timed out.  Wait for
  the backend-exit record alone there; connect_fails issues one connection at
  a time, so the next exit after the recorded offset is the one we triggered.
  (authentication/test_001_password)

- test_003_peer probes whether peer auth is supported by connecting once and
  checking the server log for "peer authentication is not supported on this
  platform".  Under the in-process libpq layer that probe connection raises
  (peer auth fails) before the log is inspected, so the test errored instead
  of skipping.  Tolerate the connection error and decide from the log.

- test_002_saslprep used os.environb, which does not exist on Windows, and an
  environment variable cannot carry byte-exact non-ASCII passwords to libpq on
  Windows anyway (psql has no UTF-8 active-code-page manifest, so getenv
  re-encodes through the process code page).  Deliver the password through a
  password file instead: libpq reads the file content verbatim, so the exact
  bytes reach the SCRAM exchange on every platform.
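
A password-file line can be built from raw bytes so nothing is re-encoded on the way to libpq. This sketch follows the documented password-file format (hostname:port:database:username:password, with `\` and `:` escaped by a backslash and `*` as a wildcard); the helper names are hypothetical, and the caller is assumed to point libpq at the file through PGPASSFILE or the passfile connection parameter.

```python
import os
from typing import Iterable


def pgpass_escape(field: bytes) -> bytes:
    """Escape backslash and colon as the password-file format requires."""
    return field.replace(b"\\", b"\\\\").replace(b":", b"\\:")


def pgpass_line(user: bytes, password: bytes) -> bytes:
    # "*" wildcards for host, port and database; raw bytes throughout, so
    # a non-ASCII password reaches libpq without any re-encoding.
    return b"*:*:*:" + pgpass_escape(user) + b":" + pgpass_escape(password) + b"\n"


def write_pgpass(path: str, lines: Iterable[bytes]) -> None:
    # On Unix libpq ignores a password file that group or others can
    # access, so create it as 0600; O_BINARY (Windows only) stops the C
    # runtime from translating newlines.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.writelines(lines)
```
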
…service

The service-file tests embed file paths in connection strings as a
servicefile= keyword value.  libpq treats backslash as an escape character
when parsing a connection string, so a Windows path like
C:\...\pg_service_valid.conf was mangled to C:...pg_service_valid.conf and the
file was reported "not found".  Forward slashes are a valid path separator on
Windows for files, environment values and libpq's own servicefile bookkeeping,
so use them for every path the fixture hands out.

Verified on Windows: the three servicefile= tests that failed with a mangled
path now pass.
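
A minimal sketch of the normalisation, assuming it is applied only on Windows (on POSIX a backslash is a legal filename character and must be left alone). `libpq_path` is a hypothetical name.

```python
import os
from pathlib import PureWindowsPath


def libpq_path(path: str, windows: bool = os.name == "nt") -> str:
    """Return path with forward slashes on Windows, unchanged elsewhere.

    libpq treats a backslash in a connection-string value as an escape
    character, while Windows accepts "/" as a directory separator.
    """
    return PureWindowsPath(path).as_posix() if windows else path
```

The same normalisation applies to the other Windows path fixes in this series, such as sslkeylogfile= values, pg_waldump's --path, and basebackup_to_shell.command.

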
log_position() returned the log file's byte size (os.path.getsize), but
log_content() reads the file in text mode, normalising CRLF to LF.  The offset
is then used to slice log_content()[offset:] in log_check/log_contains/
wait_for_log.  On Windows the log has CRLF line endings, so the byte offset is
larger than the position in the normalised text and the slice skips past the
lines being checked -- log_check would miss a "connection authenticated:..."
record that is plainly in the log.  On Unix (LF only) byte size equals the
normalised length, so the bug was invisible there.

Return the character length of log_content() instead, so the offset is a
position in the same normalised text it is used to slice.

Verified on Windows against a matching build: authentication/test_001_password
now passes (it previously failed log_check at the md5 connection).
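
The mismatch is easy to reproduce with any CRLF file. In this sketch `log_content` and `log_position` are simplified stand-ins for the framework methods (they take a path instead of a node and assume UTF-8), and `log_since` is a hypothetical helper for the slicing step.

```python
def log_content(path: str) -> str:
    # Text mode with the default newline=None turns "\r\n" into "\n".
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def log_position(path: str) -> int:
    # A character offset into the normalised text, not a byte size.
    return len(log_content(path))


def log_since(path: str, offset: int) -> str:
    return log_content(path)[offset:]
```

For a log containing two CRLF-terminated lines of 8 characters each, os.path.getsize() reports 20 while log_position() reports 18; slicing the normalised text at 20 would drop the first two characters of whatever is logged next.

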
…_001

The SYSTEM_USER parallel-workers check opened an in-process libpq connection
as scram_role relying on PGPASSWORD from the environment.  The in-process
library does not portably read the environment (which is why connect_ok and
connect_fails shell out to psql), so on some platforms no password was sent
and the connection failed with "password authentication failed".  Pass the
password explicitly on the connection string instead.
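
Building the connection string by hand means following libpq's quoting rules for keyword/value strings: a value may be wrapped in single quotes, and single quotes and backslashes inside it are escaped with a backslash. `make_conninfo` and `quote_conninfo_value` are hypothetical helpers.

```python
def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq keyword=value connection string."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def make_conninfo(**params: str) -> str:
    # Keyword arguments keep their order, so the output is deterministic.
    return " ".join(f"{key}={quote_conninfo_value(val)}"
                    for key, val in params.items())
```

For example, `make_conninfo(dbname="postgres", user="scram_role", password=pw)` yields a string in which any quote or backslash in `pw` survives libpq's parser intact.

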
…vice

The service-file tests connected in-process through a libpq Session.  The
in-process library does not portably read the environment, and on the Windows
CI runner those in-process AF_UNIX connections fail at the Winsock layer
("Network is down", WSAENETDOWN) -- so all six service tests failed there even
though they pass against a local build.

Run the connections with a psql subprocess instead, exactly as the Perl
original (and every other auth test in the suite) does: psql inherits
PGSERVICE / PGSERVICEFILE / PGSYSCONFDIR from the environment, connects with
the service connection string verbatim (no host/port prepended), and exposes
the resolved service file as the :SERVICEFILE variable for the servicefile
check.

Verified on Linux and on a matching Windows build: 6/6 pass.

…trol_path

The extension_control_path test hardcoded the Unix path-list separator ":"
and used the directory paths verbatim.  On Windows the GUC separator is ";"
(":" collides with drive letters), pg_available_extensions reports the
canonicalized path with forward slashes, and backslashes in the
postgresql.conf string value must be doubled so the configuration parser
preserves them.  Mirror the original test's Windows handling.

Verified on Linux and on a matching Windows build.
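
The Windows adjustments can be sketched as two small helpers. They are illustrative only: `control_path_setting` and `reported_location` are hypothetical names, and the directories are assumed to be absolute paths already.

```python
from typing import Iterable


def control_path_setting(dirs: Iterable[str], windows: bool) -> str:
    """Build a postgresql.conf line for extension_control_path."""
    # ";" on Windows, because ":" would collide with drive letters.
    value = (";" if windows else ":").join(dirs)
    # The configuration parser treats backslash as an escape inside a
    # quoted string, so double it; double any single quote as well.
    value = value.replace("\\", "\\\\").replace("'", "''")
    return f"extension_control_path = '{value}'"


def reported_location(path: str, windows: bool) -> str:
    # The server reports the canonicalized path with forward slashes.
    return path.replace("\\", "/") if windows else path
```
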
append_to_file opened the file in text mode, so on Windows it turned each
"\n" into "\r\n".  For files with a strict line format that corrupts them:
recovery/test_042_low_level_backup writes the backup_label returned by
pg_backup_stop(), and the mangled CRLF made the server reject it with "invalid
data in file backup_label".  Open with newline="" so the text is written
verbatim; configuration files (the other callers) are unaffected since they
parse fine with LF on Windows.

Verified on Linux and on a matching Windows build: test_042 passes.
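
A minimal sketch of the change. This `append_to_file` takes a path and a string, which may not match the framework's exact signature, and the explicit UTF-8 encoding is an assumption.

```python
def append_to_file(path: str, text: str) -> None:
    # newline="" disables newline translation on write, so "\n" stays
    # "\n" on every platform instead of becoming "\r\n" on Windows.
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write(text)
```
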
…aldump

pg_waldump splits a WAL-file path on "/" to separate the directory from the
start segment, so a Windows path built with os.path.join (backslashes) failed
with "could not locate".  Build the WAL path with forward slashes, matching the
Perl tests.

Also use a short tablespace location (tempdir_short) in test_001_basic, as the
Perl does, instead of a deep path under the data directory.

Verified on Linux and a matching Windows build: test_002_save_fullpage passes;
test_001_basic gets past the WAL-path failure it hit on CI.

…l test

basebackup_to_shell.command is stored in postgresql.conf and run by the
server.  The test built it with the raw GZIP_PROGRAM path and the backup
directory, whose backslashes on Windows were mangled, so the command failed
and the backup aborted ("backend exiting before pg_backup_stop was called").
Forward-slash the gzip program (as the Perl test does) and the backup path in
the command string.

Verified on Linux and a matching Windows build.

The test passes database/role names containing high-bit, non-UTF-8 byte
sequences as subprocess arguments.  The Perl test passes them via a narrow
(ANSI) process spawn and runs everywhere except MSYS2.  Python's subprocess
always uses CreateProcessW (wide): a bytes arg must be valid UTF-8 (it is not
here), and a str arg is converted to the child's argv through the active code
page, which cannot represent the 0x80-0x9F range at all.  There is no portable
way to pass these bytes as a child's arguments under Python on any Windows, so
skip there with a clear reason.

test_002_compare_backups repoints a tablespace by removing the pg_tblspc/<oid>
link and recreating it.  On Windows that link is a directory junction, which
os.remove cannot delete -- it fails with "Access is denied" (WinError 5).  Add
a remove_dir_symlink() helper (os.rmdir for the Windows junction, os.unlink for
a POSIX symlink, the counterpart to dir_symlink) and use it.

Verified on Linux; the matching Windows build cannot fully exercise tablespace
tests due to an unrelated junction-stat issue in that local build, but the CI
failure was exactly the os.remove "Access is denied" this fixes.
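
The helper's platform split can be sketched as follows. The body is an assumption based on the description above, not the suite's code.

```python
import os


def remove_dir_symlink(path: str) -> None:
    """Remove a directory link without touching what it points to."""
    if os.name == "nt":
        # A directory junction behaves like a directory: os.remove()
        # fails with "Access is denied", but os.rmdir() removes the
        # junction itself.
        os.rmdir(path)
    else:
        os.unlink(path)  # a POSIX symlink is removed like a file
```
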
Two Windows problems in test_001_start_stop:

- unix_socket_directories was written to postgresql.conf with the raw
  short_tempdir() path, whose backslashes are mangled by the configuration
  parser, so the server could not create its socket and failed to start.
  Write it with forward slashes (as the Perl test does).

- The second "pg_ctl start" is expected to fail because a server is already
  running, but on Windows pg_ctl needs more than its ~2 second slop time to
  notice the running postmaster; without a wait it spuriously succeeds.  Sleep
  3 seconds first on Windows, matching the Perl test.

Verified on Linux and a matching Windows build.

The test creates a nondeterministic ICU collation and skipped only when a
catalog query found no ICU rows.  But pg_collation can contain collprovider='i'
rows even on a server built without a usable ICU provider, so the check passed
and the test then failed at CREATE COLLATION with "ICU is not supported in this
build" (seen on the Windows builds, which set with_icu=no).  Skip on the
build's with_icu flag instead, exactly as the Perl test does.

Verified: runs and passes with with_icu=yes (Linux); skips with with_icu=no.

…lgorithm

The per-algorithm backup path is <backup_dir>/<format>/<algorithm>.
pg_basebackup creates the target directory with pg_mkdir_p, which splits the
path on "/", so the os.path.join backslash path on Windows could not have its
intermediate <format> directory created ("could not create directory ... No
such file or directory").  Build the path with forward slashes, as the Perl
test does.

Verified on Linux and a matching Windows build.

pgbench echoes each script's path back ("script N: <path>" / "type: <path>")
and the expectations match it with ".*/<name>".  The --file argument was built
with os.path.join, so on Windows the backslash path did not match and many
custom-script subtests failed.  Build the --file path with forward slashes.

Verified on Linux (105 passed); fixes the script-path-pattern subtests.

…heck

check_pgbench_logs validates the log file paths with a regex anchored on
"/<prefix>.<pid>", but the paths come from os.path.join, which uses backslashes
on Windows, so the "file name format" check found zero matches there.  Accept
either separator in the regex.

Verified on Linux (105 passed) and a matching Windows build (logs_sampling /
logs_contents now pass).

The keylog subtest passes sslkeylogfile=<path> in the connection string with an
os.path.join path; libpq treats backslashes as escapes, so on Windows the
keylog file was written to a mangled path and the "keylog file exists" check
failed.  Build the path (and the basedir used for the invalid-path case) with
forward slashes.

Verified on Linux and a matching Windows build (with PG_TEST_EXTRA=ssl).

…ackup restore

_restore_node copied a plain backup with shutil.copytree(symlinks=True), then
relocated the tablespace by following the pg_tblspc/<oid> link.  On Windows
copytree does not preserve a directory junction, so the tablespace contents
were copied into pg_tblspc/<oid> as a real directory, and the subsequent link
removal failed ("directory is not empty").  When the entry is no longer a link,
move that directory to the destination instead; the POSIX symlink path is
unchanged.

Verified on Linux; the Windows path is exercised on CI (the local Windows build
cannot run tablespace tests due to an unrelated junction-stat issue).

The invalid-magic subtest copies a WAL file into a "broken_wal" directory and
runs pg_waldump on it.  That path was built with os.path.join, so on Windows
pg_waldump (which locates a WAL file by splitting the path on "/") could not
find it.  Build it with forward slashes, like the other WAL paths in this file.

Verified on Linux; same pattern as the already-working _wal_path fix.

pg_upgrade never uses Unix sockets on Windows (the socket setup in
src/bin/pg_upgrade/server.c is guarded by #if !defined(WIN32)); it
connects to the clusters it starts over localhost TCP.  But under
PG_TEST_USE_UNIX_SOCKETS the suite initializes clusters with
listen_addresses='', so pg_upgrade could not connect to them at all.

Add a pypg.util.enable_localhost_tcp() helper (a no-op off Windows) and
call it for every cluster handed to pg_upgrade in the pg_upgrade tests.

The helper appends listen_addresses='localhost' rather than the literal
'127.0.0.1' so the server binds exactly what libpq resolves.  pg_upgrade
passes no host on Windows, so libpq uses its default (localhost), which on
an IPv6-enabled host resolves to ::1 first.  Binding only 127.0.0.1 leaves
that ::1 candidate refused, and pg_upgrade's parallel task framework waits
on the connection socket with select() but no exception set -- so on
Windows the refused async connect is never reported and pg_upgrade hangs
forever.  Listening on "localhost" covers every address the client may
try (and degrades to just 127.0.0.1 where IPv6 is unavailable).
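
A sketch of such a helper follows. It assumes the helper receives the path of the cluster's postgresql.conf (the real helper's signature is not shown here). It appends a single setting on Windows and does nothing elsewhere; appending is enough because a later setting in postgresql.conf overrides an earlier one.

```python
import os


def enable_localhost_tcp(conf_path: str, windows: bool = os.name == "nt") -> bool:
    """Let pg_upgrade reach the cluster over TCP on Windows.

    Returns True if the configuration was changed.
    """
    if not windows:
        return False  # no-op: pg_upgrade uses Unix sockets here
    # "localhost" rather than "127.0.0.1", so the server also listens on
    # ::1 when libpq resolves localhost to the IPv6 address first.
    with open(conf_path, "a", encoding="utf-8", newline="") as f:
        f.write("listen_addresses = 'localhost'\n")
    return True
```
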
…ndows

The pg_waldump basic test feeds --path the pgdata directory and, for the
archive scenarios, the pg_wal.tar / pg_wal.tar.gz paths built with
os.path.join (backslashes on Windows).  pg_waldump splits the path on "/",
so the backslash form made it fail to open the archive.  Normalize the
scenario path to forward slashes, matching the other WAL path fixes.

…on Windows

The open_file_fails, open_directory_fails and search_directory_fails
scenarios make a file or directory unreadable with chmod(0) and expect
pg_verifybackup to report it.  Windows ignores those mode bits, so the
backup still verifies and the assertion that it should fail does not hold.
Skip these scenarios on Windows (and cygwin), mirroring the skip condition
the original test uses.  A comment already documented this intent, but the
skip itself was never implemented.

…s 004 and 006

test_004_subscription and test_006_transfer_modes invoked pg_upgrade through
a node's command_*/pg_bin helpers, which inject PGHOST=<node Unix socket dir>
into the environment.  On Windows pg_upgrade never uses Unix sockets
(cluster->sockdir is NULL there), so it relies on libpq's default host -- but
the leaked PGHOST overrode that default and pointed pg_upgrade at the node's
socket directory instead of localhost TCP, giving "connection to server on
socket ... failed: Connection refused".  Run pg_upgrade via the bare pg_bin
fixture (no PGHOST), as the other pg_upgrade tests already do.

…on Windows

The test writes a file with a deliberately non-UTF8 name to exercise backup
of such files.  On some Windows Python builds os.path.join() of the byte
path raises UnicodeDecodeError itself (in ntpath), before open() is even
reached -- but the join sat outside the try/except.  Move it inside so the
best-effort coverage is skipped cleanly where the path cannot be formed.

…pall test

The restore_tablespace case matched the dumped CREATE TABLESPACE LOCATION
against re.escape() of the path used to create it.  The path is built with
os.path.join over a forward-slashed test directory, yielding a mixed-separator
string, while the server canonicalizes the stored location -- so on the MinGW
build the dumped path no longer matched the expected one.  Build the location
pattern with a separator class that accepts "/", "\" or "\\" between path
components, so it matches whatever form the server dumps on any platform.
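
One way to build such a pattern is sketched below. `path_pattern` is a hypothetical helper: it splits the expected path on either separator, escapes each component, and joins them with a group that accepts "/", "\" or "\\".

```python
import re


def path_pattern(path: str) -> str:
    """Regex for path that accepts one slash, or one or two backslashes,
    between components."""
    parts = re.split(r"[\\/]+", path)
    return r"(?:/|\\{1,2})".join(re.escape(p) for p in parts)
```

Matching with re.fullmatch (or anchoring the pattern inside a larger CREATE TABLESPACE regex) keeps the check strict about the components themselves while tolerating whichever separator form the server dumps.

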
adunstan closed this Jun 12, 2026
adunstan deleted the pytap branch June 12, 2026 17:32